TISSUE SPECIFIC GENES OF DIAGNOSTIC IMPORT

TECHNICAL FIELD 

The present invention relates to a composition comprising a plurality of polynucleotides which 
are cell and/or tissue specific. These polynucleotides may be used to define and direct a metabolic or
developmental process, to identify or to monitor the progression of a condition, disease, or disorder, or 
to evaluate and monitor the efficacy of a treatment protocol. 

BACKGROUND ART 
Array technology can provide a simple way to explore the expression of a single polymorphic
gene or the expression profile of a large number of related or unrelated genes. When the expression of
a single gene is examined, arrays are employed to detect the expression of a specific gene or its
variants. When an expression profile is examined, arrays provide a platform for examining which
genes are tissue specific, direct the differentiation of a cell type or tissue, carry out housekeeping
functions, function as parts of a signaling cascade, or characterize a particular genetic predisposition,
condition, disease, or disorder.

The application of gene expression profiling is particularly relevant to improving diagnosis and
prognosis of disease. However, in order to determine whether expression of a particular gene in a
particular disease is significant, it is useful to provide a reference set of tissue and cell specific genes
against which genes expressed during the disease process may be compared. For example, both the
levels and sequences expressed in brain tumors may be compared with the levels and sequences
expressed in normal brain tissue. These comparisons may be made on a single array by incorporating a
particular tissue or cell specific reference set alongside novel sequences or on multiple arrays, each of
which contains at least some subset of the known reference set.

The present invention satisfies a need in the art in that it provides such a reference set. The 
reference set may be used in its entirety or in part to produce an expression profile that may be used to
define and direct a metabolic or developmental process, to identify or to monitor the progression of a 
condition, disease, or disorder, or to evaluate and monitor the efficacy of a treatment protocol. 

SUMMARY 

The present invention provides a plurality of tissue or cell specific polynucleotides which may
be used on an array to produce an expression profile. This profile may define expression of the
polynucleotides in normal tissue, during a particular metabolic or developmental process or during the
onset, progression, or treatment of a human condition, disease, or disorder. In one embodiment, these
polynucleotides are selected from SEQ ID NOs:1-416.

The invention also provides a plurality of polynucleotides which display tissue or cell specific
expression and are selected from: a) SEQ ID NOs:209-218 and 1-10, cell specific polynucleotides of
heart and fragments thereof; b) SEQ ID NOs:219-249 and 11-41, cell specific polynucleotides of
skeletal muscle and fragments thereof; c) SEQ ID NOs:250-251 and 42-43, cell specific polynucleotides
of uterus and fragments thereof; d) SEQ ID NOs:252-256 and 44-48, cell specific polynucleotides of
ovary and fragments thereof; e) SEQ ID NOs:257-263 and 49-55, cell specific polynucleotides of
stomach and fragments thereof; f) SEQ ID NOs:264-283 and 56-75, cell specific polynucleotides of
intestine and fragments thereof; g) SEQ ID NOs:284-293 and 76-85, cell specific polynucleotides of
lung and fragments thereof; h) SEQ ID NOs:294-345 and 86-137, cell specific polynucleotides of liver
and fragments thereof; i) SEQ ID NOs:346-356 and 138-148, cell specific polynucleotides of kidney
and fragments thereof; j) SEQ ID NOs:357-374 and 149-166, cell specific polynucleotides of pancreas
and fragments thereof; and k) SEQ ID NOs:375-416 and 167-208, cell specific polynucleotides of
brain and fragments thereof. In one aspect, the plurality of polynucleotides are immobilized on a
substrate.
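
The tissue groupings listed above can be expressed as a simple lookup. The following is a minimal sketch in Python, not part of the disclosure; the names TISSUE_RANGES and tissue_for_seq_id are hypothetical, and the ranges are transcribed from items a) through k).

```python
# Hypothetical lookup table: each tissue maps to the two SEQ ID NO ranges
# listed for it in items a) through k) above (inclusive bounds).
TISSUE_RANGES = {
    "heart": ((209, 218), (1, 10)),
    "skeletal muscle": ((219, 249), (11, 41)),
    "uterus": ((250, 251), (42, 43)),
    "ovary": ((252, 256), (44, 48)),
    "stomach": ((257, 263), (49, 55)),
    "intestine": ((264, 283), (56, 75)),
    "lung": ((284, 293), (76, 85)),
    "liver": ((294, 345), (86, 137)),
    "kidney": ((346, 356), (138, 148)),
    "pancreas": ((357, 374), (149, 166)),
    "brain": ((375, 416), (167, 208)),
}


def tissue_for_seq_id(seq_id):
    """Return the tissue whose listed ranges contain the given SEQ ID NO."""
    for tissue, ranges in TISSUE_RANGES.items():
        for low, high in ranges:
            if low <= seq_id <= high:
                return tissue
    raise ValueError(f"SEQ ID NO {seq_id} is outside 1-416")
```

Under this sketch, tissue_for_seq_id(212) and tissue_for_seq_id(4) both resolve to heart, consistent with the paired ranges of item a).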

In another embodiment, the expression of a plurality of polynucleotides is used to detect
expression in a tissue. In one aspect, the tissue is embryonic stem cells which are differentiating into
brain, heart, kidney, liver, lung, muscle or pancreatic tissues. In a second aspect, the tissue is a biopsy
from diseased brain, heart, kidney, liver, lung, muscle, ovarian, pancreatic, small intestine, stomach, or
uterine tissues which is being diagnosed for a cancer or immune or inflammatory disease or subjected to
forensic analysis. In a third aspect, the point of origin of a metastatic cancer is determined.

In another embodiment, the polynucleotides are used in high throughput methods of screening
molecules or compounds to identify a ligand, the method comprising combining a polynucleotide with
molecules or compounds under conditions to allow specific binding and detecting specific binding,
thereby identifying a ligand which specifically binds to the polynucleotide. The molecules or
compounds to be screened are selected from DNA molecules, RNA molecules, PNAs, mimetics,
peptides, and proteins.

In another embodiment, the invention provides a substantially purified polynucleotide selected
from SEQ ID NOs:212, 228, 233, 259, 271, 287, 316-319, 324, 370, 379, 380, 383, 410, and 412 or
a fragment thereof, SEQ ID NOs:4, 20, 25, 51, 63, 79, 108-111, 116, 162, 171, 172, 175, 202, and
204. In one aspect, the polynucleotide selected from SEQ ID NOs:212, 228, 233, 259, 271, 287,
316-319, 324, 370, 379, 380, 383, 410, and 412 or a fragment thereof, SEQ ID NOs:4, 20, 25, 51, 63,
79, 108-111, 116, 162, 171, 172, 175, 202, and 204 is used in an expression vector transformed into a
host cell to produce a protein or a portion thereof by culturing the host cell under conditions for the
expression of protein and recovering the protein from the host cell culture.

In a third embodiment, the invention provides a protein or a portion thereof. In one aspect, the
protein is used in a high throughput method to screen large numbers of molecules or compounds to






WO 01/32927 PCT/US00/30396 
identify at least one ligand which specifically binds the protein, the method comprising combining the
protein with the molecules or compounds under conditions to allow specific binding and detecting
specific binding, thereby identifying a ligand which specifically binds the protein. In a second aspect,
the protein is used to purify a ligand, the method comprising combining the protein with a sample under
conditions to allow specific binding, recovering the bound protein, and separating the protein from the
ligand, thereby obtaining purified ligand. The molecules or compounds screened or purified may be
selected from DNA molecules, RNA molecules, PNAs, mimetics, peptides, proteins, agonists,
antagonists, antibodies or their fragments, immunoglobulins, inhibitors, drug compounds, and
pharmaceutical agents. Any of these molecules or compounds may have diagnostic or therapeutic
applications.

DESCRIPTION OF THE SEQUENCE LISTING AND TABLES 

A portion of the disclosure of this patent document contains material that is subject to 
copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of 
the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent 
file or records, but otherwise reserves all copyright rights whatsoever.

The Sequence Listing is a compilation of polynucleotides obtained by sequencing and extension 
of clone inserts of different cDNAs. Each sequence is identified by a sequence identification number 
(SEQ ID NO or SEQ ID) and by the clone number (Incyte ID) from which it was obtained. 

Table 1 lists the fragments and extended polynucleotides by their SEQ ID NO and cDNA 
respectively, tissue, and by the description associated with at least a fragment of a homologous
polynucleotide in GenBank. The descriptions were obtained using the sequences of the Sequence 
Listing and BLAST analysis. 

Table 2 lists the source of the RNAs used to produce target polynucleotides for hybridization
to the UNIGEM V microarray (Incyte Genomics, Palo Alto CA). The columns present the Source No,
Tissue, Age, Ethnicity/Sex, Cause of Death, and Conditions or Diseases, as known for each donor.

Table 3 shows the data for each of the clones across each of the tissues used in the
experiments. The columns present Clone ID and the tissues (with source number): heart, skeletal
muscle, uterus, stomach, small intestine, lung, liver, kidney, pancreas, spleen and brain. This data was
produced using GEMTOOLS software (Incyte Genomics).

Table 4 presents the analysis of variance (ANOVA) for the data. The columns present Clone
ID, Var. Betw (variance between), Var. Within (variance within), F (value), and Probability. These
values were produced using batch ANOVA (Sokal and Rohlf (1969) Biometry: The Principles and
Practice of Statistics in Biological Research. WH Freeman, San Francisco CA) and EXCEL98
software (Microsoft, Seattle WA).
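
For orientation, the quantities named in the Table 4 columns can be illustrated with a textbook one-way ANOVA. The sketch below is in Python and is not the batch procedure actually used; it assumes that "variance between" and "variance within" correspond to the between-group and within-group mean squares, and the function name one_way_anova is hypothetical. The Probability column would additionally require the F distribution, which is omitted here.

```python
def one_way_anova(groups):
    """Return (variance_between, variance_within, F) for one clone.

    groups: list of lists, one list of measurements per tissue.
    Assumes at least two groups and more measurements than groups.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Sum of squares between groups, weighted by group size.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Sum of squares within groups, around each group's own mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between, ms_within, ms_between / ms_within
```

A large F indicates that the differences among tissue means are large relative to the scatter within each tissue.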
Table 5 shows the cell and tissue specificity of the polynucleotides across tissues (heart,
skeletal muscle, uterus, stomach, small intestine, lung, liver, kidney, pancreas, spleen and brain). The
cell and tissue specific groupings were produced using mean values [mean(tissue) - mean(entire set)]
and grouped using EXCEL98 software (Microsoft).
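
The bracketed expression, mean(tissue) - mean(entire set), can be sketched as follows. This is an illustrative Python sketch only; the function name specificity_scores and the input layout (one dictionary per clone, mapping tissue name to its measurements) are assumptions and do not describe the spreadsheet actually used.

```python
def specificity_scores(expression):
    """Return mean(tissue) - mean(entire set) for each tissue of one clone.

    expression: dict mapping tissue name -> list of measurements.
    """
    all_values = [v for values in expression.values() for v in values]
    overall_mean = sum(all_values) / len(all_values)
    return {
        tissue: sum(values) / len(values) - overall_mean
        for tissue, values in expression.items()
    }
```

A clone with a large positive score in one tissue and scores near or below zero elsewhere would be a candidate for that tissue's specific grouping.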

DESCRIPTION OF THE INVENTION 

Definitions

The term "array" refers to an ordered arrangement of hybridizable polynucleotides. These are
arranged so that there are a "plurality" of polynucleotides, preferably at least one polynucleotide,
preferably at least 100 polynucleotides, and more preferably at least 1,000 polynucleotides, and even
more preferably at least 10,000 polynucleotides on a 1 cm² substrate. The maximum number of
polynucleotides is unlimited, but is at least 100,000. Furthermore, the signal from each of the
hybridized polynucleotides is individually distinguishable.

A "polynucleotide" refers to a chain of nucleotides. Preferably, the chain has from about 15 to
10,000 nucleotides and more preferably from about 400 to 6,000 nucleotides. The term "probe" refers
to a probe polynucleotide capable of hybridizing with a target polynucleotide to form a hybridization
complex. In most instances, the sequences of the probe and target polynucleotides will be
complementary (no mismatches) when aligned. In some instances, there may be up to a 10% mismatch.
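
The complementarity criterion can be made concrete with a short Python sketch. It assumes equal-length DNA sequences written 5' to 3' and an ungapped alignment; the function names reverse_complement and mismatch_fraction are hypothetical and are not taken from the disclosure.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def mismatch_fraction(probe, target):
    """Fraction of probe positions that fail to pair with the target.

    The probe is compared against the reverse complement of the target,
    which is the sequence a perfectly complementary probe would have.
    """
    expected = reverse_complement(target)
    if len(probe) != len(expected):
        raise ValueError("sketch assumes probe and target of equal length")
    mismatches = sum(1 for p, e in zip(probe.upper(), expected) if p != e)
    return mismatches / len(probe)
```

A value of 0.0 corresponds to the fully complementary case, and a value of at most 0.10 corresponds to the tolerated mismatch stated above.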

"Fragment" refers to any part of an Incyte clone or polynucleotide which retains a useful 
characteristic. Useful fragments may be used in hybridization technologies, to identify or purify
ligands, or as a therapeutic to regulate replication, transcription or translation.

"Ligand" refers to any agent, molecule, or compound which will bind specifically to a
complementary site on a polynucleotide or protein. Such ligands stabilize or modulate the activity of
polynucleotides or proteins and may be composed of at least one of the following: inorganic and organic
substances including nucleic acids, proteins, carbohydrates, fats, and lipids.

"Purified" refers to any molecule or compound that is removed, isolated, or separated from its 
natural environment and is at least about 60% free, and more preferably about 90% free, from other
components with which it is naturally associated. 

"Specific binding" refers to a special and precise interaction between two molecules which is 
dependent upon a particular structure such as molecular side groups. For example, the hydrogen 
bonding between two single stranded nucleic acids or the binding between an epitope or a protein and
an agonist, antagonist, or antibody.

"Sample" is used in its broadest sense. A sample containing polynucleotides may comprise a 
bodily fluid; an extract from a cell, chromosome, organelle, or membrane isolated from a cell; genomic 
DNA, RNA, or cDNA in solution or bound to a substrate; a cell; a tissue; a tissue print; a finger print, 
a hair, and the like. 

"Portion" refers to any part of a protein used for any purpose, but especially for the screening
of molecules or compounds to identify those which specifically bind to that portion and for producing
antibodies.

The phrase "polynucleotide encoding a protein" refers to a nucleic acid sequence that closely
aligns with a sequence which encodes a conserved protein motif or domain that was identified by
employing analyses well known in the art. These analyses include Hidden Markov Models (HMMs)
such as PFAM (Krogh (1994) J Mol Biol 235:1501-1531; Sonnhammer et al. (1998) Nucl Acids Res
26:320-322), BLAST (Basic Local Alignment Search Tool; Altschul (1993) J Mol Evol 36:290-300;
and Altschul et al. (1990) J Mol Biol 215:403-410), or other analytical tools such as BLIMPS
(Henikoff et al. (1998) Nucl Acids Res 26:309-12). Additionally, "polynucleotide encoding a protein"
may refer to a polynucleotide that is expressed in or associated with specific human metabolic
processes, conditions, disorders, or diseases.

"Cell specific", as defined herein, refers to those polynucleotides which occur at a statistically 
significant level in more than one tissue. The commonality between the tissues may be ascribed to the
types of cells that are an integral part of or would be expected to be found in a particular tissue, e.g.,
blood cells, nerve cells, endothelial cells, and the like.

The Invention 

The present invention provides a plurality of tissue or cell specific polynucleotides which may
be used on an array to produce an expression profile. This profile may define expression of these
polynucleotides in normal tissue, during a particular metabolic or developmental process or during the
onset, progression, or treatment of a human condition, disease, or disorder. These polynucleotides
represent known and novel genes normally expressed in the cells or tissues of the brain, heart, intestine,
kidney, liver, lung, smooth muscle, ovary, pancreas, spleen, stomach, or uterus. The expression of
these polynucleotides may be compared to the expression of other known or novel genes found on an
array. The plurality of polynucleotides, the entire reference set, comprises SEQ ID NOs:1-416. Tissue
or cell-specific reference sets may be selected from a) SEQ ID NOs:209-218 and 1-10, cell specific
polynucleotides of heart and fragments thereof; b) SEQ ID NOs:219-249 and 11-41, cell specific
polynucleotides of skeletal muscle and fragments thereof; c) SEQ ID NOs:250-251 and 42-43, cell
specific polynucleotides of uterus and fragments thereof; d) SEQ ID NOs:252-256 and 44-48, cell
specific polynucleotides of ovary and fragments thereof; e) SEQ ID NOs:257-263 and 49-55, cell
specific polynucleotides of stomach and fragments thereof; f) SEQ ID NOs:264-283 and 56-75, cell
specific polynucleotides of intestine and fragments thereof; g) SEQ ID NOs:284-293 and 76-85, cell
specific polynucleotides of lung and fragments thereof; h) SEQ ID NOs:294-345 and 86-137, cell
specific polynucleotides of liver and fragments thereof; i) SEQ ID NOs:346-356 and 138-148, cell
specific polynucleotides of kidney and fragments thereof; j) SEQ ID NOs:357-374 and 149-166, cell
specific polynucleotides of pancreas and fragments thereof; and k) SEQ ID NOs:375-416 and 167-208,
cell specific polynucleotides of brain and fragments thereof. The plurality of polynucleotides is
arrayed on a substrate, preferably a microarray or used as probes.

The invention also provides a substantially purified polynucleotide selected from SEQ ID
NOs:212, 228, 233, 259, 271, 287, 316-319, 324, 370, 379, 380, 383, 410, and 412 or a fragment
thereof, SEQ ID NOs:4, 20, 25, 51, 63, 79, 108-111, 116, 162, 171, 172, 175, 202, and 204. These
polynucleotides may be used in an expression vector transformed into a host cell to produce a protein or
a portion thereof by culturing the host cell under conditions for the expression of protein and recovering
the protein from the host cell culture.

The microarray can be used for large scale genetic or gene expression analysis of a large
number of novel target polynucleotides. These targets are prepared by methods well known in the art
and are from mammalian cells or tissues which are in a certain stage of development or differentiation;
have been treated with a known molecule or compound, such as a cytokine, growth factor, a drug, and
the like; or have been extracted or biopsied from a mammal with a known or unknown condition,
disorder, or disease before or after treatment. Specifically, the plurality of polynucleotides are useful to
determine the differentiation of embryonic stem cells toward brain, heart, kidney, liver, lung, muscle or
pancreatic tissues or to determine whether a cancer is metastatic or its source by analyzing biopsied
tissue from diseased brain, heart, kidney, liver, lung, muscle, ovarian, pancreatic, small intestine,
stomach, or uterine tissues. The plurality of polynucleotides may be used during the diagnosis of a
cancer, an immunopathology, a neuropathology, and the like.

The target polynucleotides are hybridized to the probe polynucleotides for the purpose of
defining a novel gene profile associated with that developmental stage, treatment, condition, disorder or
disease. Subsequently, the gene profile can be used for diagnosis, prognosis, or monitoring of
treatments where altered expression of known and novel genes is associated with a cancer, an
immunopathology, a neuropathology, and the like. In some cases, a gene profile can be used to
investigate an individual's predisposition to a condition, disorder or disease such as a cancer, an
immunopathology, a neuropathology, and the like.

When the polynucleotides of the invention are employed as hybridizable polynucleotides on a
microarray, the polynucleotides are organized in an ordered fashion so that each polynucleotide is
present at a specified location on the substrate. Because the probe polynucleotides are at specified
locations on the substrate, their hybridization patterns and intensities can be compared with the
hybridization patterns and intensities of other known and novel polynucleotides to create an expression
profile. Such a profile, interpreted in terms of expression levels of the cell and tissue specific, known,
and novel genes can be correlated with a particular metabolic process, developmental stage, treatment,
condition, disorder, disease, or stage of disease.

The plurality of polynucleotides can also be used to identify or purify a molecule or compound
which specifically binds to at least one of the polynucleotides. These molecules may be identified from
a sample or in high throughput mode from a large number of molecules and compounds including
mRNAs, cDNAs, genomic fragments, and the like. Typically, the molecules or compounds will be of
particular diagnostic or therapeutic interest.

If nucleic acid molecules in a sample enhance the hybridization background, it may be
advantageous to remove the offending molecules. One method for removing such molecules is by
hybridizing the sample with immobilized probe polynucleotides and washing away those molecules that
do not form hybridization complexes. At a later point, hybridization complexes can be dissociated,
thereby releasing those molecules which specifically bind the probe polynucleotides.

Method for Selecting Polynucleotide Probes

There are numerous different ways to select polynucleotides. Some of the more common
ones include selecting probes from genes which are well known in the literature to have an
association with a particular condition, disorder, or disease, which have a common functional
characteristic such as the presence of a particular motif or domain or a signal peptide, which are
expressed in a particular cell type or tissue such as blood or bone marrow, and the like.

Preferably, the probes are non-redundant; therefore, no more than one probe represents a 
particular gene. Control sequences, however, may be selected specifically for their redundancy. 
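
The non-redundancy preference can be sketched as a filter that keeps one probe per gene. The Python below is illustrative only; the function name select_non_redundant and the (probe identifier, gene identifier) pairs are hypothetical, and keeping the first probe encountered is an arbitrary choice made for the sketch.

```python
def select_non_redundant(probes):
    """Keep the first probe seen for each gene.

    probes: iterable of (probe_id, gene_id) pairs (hypothetical identifiers).
    Returns the list of retained probe_ids in input order.
    """
    seen_genes = set()
    selected = []
    for probe_id, gene_id in probes:
        if gene_id not in seen_genes:
            seen_genes.add(gene_id)
            selected.append(probe_id)
    return selected
```

Control sequences, which may be chosen for their redundancy, would simply bypass such a filter.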

Polynucleotides of the composition may be manipulated to optimize their performance in
hybridization technologies. Polynucleotide selection may be optimized by examining the sequences
using a computer algorithm to identify fragments lacking potential secondary structure. Computer
algorithms such as those employed in Vector NTI software (Informax, N. Bethesda MD) or
LASERGENE software (DNASTAR, Madison WI) are well known in the art. These programs search
nucleic acid sequences to identify stem loop structures and tandem repeats and to analyze G+C content
of the sequence. In mammalian arrays, those sequences with a G+C content greater than 60% may be
excluded. Alternatively, polynucleotides can be optimized under experimental conditions to determine
whether polynucleotide probes and their complementary targets hybridize optimally.
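
The G+C criterion lends itself to a short illustration. The Python sketch below is not the algorithm of the commercial packages named above; the function names gc_fraction and passes_gc_filter are hypothetical, and only the 60% threshold is taken from the text.

```python
def gc_fraction(seq):
    """Return the fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def passes_gc_filter(seq, max_gc=0.60):
    """True when G+C content does not exceed the threshold (60% by default)."""
    return gc_fraction(seq) <= max_gc
```

Screening for stem loop structures and tandem repeats would require additional analysis that is outside the scope of this sketch.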

Where the greatest numbers of non redundant polynucleotides are desired, the polynucleotides
may be compared with clustered or assembled sequences to assure that each polynucleotide is derived
from a different gene. To obtain a longer or different probe for a particular gene, the polynucleotide
may be physically extended utilizing the partial nucleotide sequences derived from the Incyte clone and
employing the XL-PCR kit (Applied Biosystems, Foster City CA) or other means known in the art.

Polynucleotide Probes 

Polynucleotide probes can be genomic DNA or cDNA or mRNA, or any RNA-like or
DNA-like material, such as peptide nucleic acids, branched DNAs and the like. They may be the sense
or antisense strand. Where targets are double stranded, probes may be either sense or antisense
strands. Where targets are single stranded, probes are complementary single strands.

In one embodiment, polynucleotide probes are cDNAs. The size of the cDNAs may vary and
is preferably from 15 to 10,000 nucleotides, more preferably from 60 to 4000 nucleotides, and most
preferably from 200 to 600 nucleotides.

In another embodiment, probes are plasmids. In this case, the cDNA sequence of interest is the
insert sequence. Excluding the vector DNA and regulatory sequences, cDNA size may vary preferably
from 15 to 10,000 nucleotides, more preferably from 60 to 4000 nucleotides, and most preferably from
200 to 600 nucleotides.

Polynucleotide probes can be prepared by a variety of synthetic or enzymatic methods well
known in the art. Probes can be synthesized, in whole or in part, using chemical methods well known in
the art (Caruthers et al. (1980) Nucleic Acids Symp Ser (7):215-233). Alternatively, probes can be
produced enzymatically or recombinantly, by in vitro or in vivo transcription.

Nucleotide analogues can be incorporated into the probes by methods well known in the art.
The only requirement is that the incorporated nucleotide analogues of the probe must base pair with
target nucleotides. For example, certain guanine nucleotides can be substituted with hypoxanthine
which base pairs with cytosine residues. However, these base pairs are less stable than those between
guanine and cytosine. Alternatively, adenine nucleotides can be substituted with 2,6-diaminopurine
which can form stronger base pairs than those between adenine and thymidine.

Additionally, probes can include nucleotides that have been derivatized chemically or
enzymatically. Typical chemical modifications include derivatization with acyl, alkyl, aryl or amino
groups.

Probes can be synthesized on a substrate. Synthesis on the surface of a substrate may be
accomplished using a chemical coupling procedure and a piezoelectric printing apparatus as described
by Baldeschweiler et al. (PCT/WO95/251116). Alternatively, the probe can be synthesized on a
substrate surface using a self-addressable electronic device that controls when reagents are added as
described by Heller et al. (USPN 5,605,662).

Complementary DNA (cDNA) can be arranged and then immobilized on a substrate. Probes
can be immobilized by covalent means such as by chemical bonding procedures or UV. In one such
method, a cDNA is bound to a glass surface which has been modified to contain epoxide or aldehyde
groups. In another case, a cDNA probe is placed on a polylysine coated surface and then UV
cross-linked as described by Shalon et al. (PCT/WO95/35505; incorporated herein by reference). In
yet another method, a DNA is actively transported from a solution to a given position on a substrate by
electrical means (Heller et al. supra). Alternatively, probes, clones, plasmids or cells can be arranged
on a filter. In the latter case, cells are lysed, proteins and cellular components degraded, and the DNA
is coupled to the filter by UV cross-linking.
Furthermore, probes do not have to be directly bound to the substrate, but rather can be bound
to the substrate through a linker group. The linker groups are typically about 6 to 50 atoms long to
provide exposure of the attached probe. Preferred linker groups include ethylene glycol oligomers,
diamines, diacids and the like. Reactive groups on the substrate surface react with a terminal group of
the linker to bind the linker to the substrate. The other terminus of the linker is then bound to the probe.

Probes can be attached to a substrate by sequentially dispensing reagents for probe synthesis
on the substrate surface or by dispensing preformed DNA fragments to the substrate surface. Typical
dispensers include a micropipette delivering solution to the substrate with a robotic system to control
the position of the micropipette with respect to the substrate. There can be a multiplicity of dispensers
so that reagents can be delivered to the reaction regions efficiently.

Sample Preparation 

In order to conduct sample analysis, a sample containing targets is provided. The samples can
be any sample containing targets and obtained from any bodily fluid (blood, urine, saliva, phlegm,
gastric juices, etc.), cultured cells, biopsies, or other tissue or forensic preparations.

DNA or RNA can be isolated from a sample according to any of a number of methods well
known to those of skill in the art. For example, methods of purification of nucleic acids are described
in Tijssen (1993) Laboratory Techniques in Biochemistry and Molecular Biology: Hybridization With
Nucleic Acid Probes. Part I. Theory and Nucleic Acid Preparation, Elsevier Science, New York NY.
In one case, total RNA is isolated using TRIZOL reagent (Life Technologies, Gaithersburg MD), and
mRNA is isolated using oligo d(T) column chromatography or glass beads. In one alternative, when
targets are derived from an mRNA, targets can be a DNA reverse transcribed from an mRNA, an RNA
transcribed from that DNA, a DNA amplified from that DNA, an RNA transcribed from the amplified
DNA, and the like. When target is derived from DNA, target can be DNA amplified from DNA, or
RNA reverse transcribed from DNA. In yet another alternative, targets are prepared by more than one
method.

When targets are amplified it is desirable to amplify the nucleic acids in the sample and to
maintain their relative abundances, including low abundance transcripts. Total mRNA can be
amplified by reverse transcription using a reverse transcriptase and a primer consisting of oligo d(T)
and a sequence encoding the phage T7 promoter to provide a single stranded DNA template. The
second DNA strand is polymerized using a DNA polymerase and an RNAse which assists in breaking
up the DNA/RNA hybrid. After synthesis of the double stranded DNA, T7 RNA polymerase can be
added, and RNA transcribed from the second DNA strand template as described by Van Gelder et al.
(USPN 5,545,522). RNA can be amplified in vitro, in situ or in vivo (Eberwine, USPN 5,514,545).

It is also advantageous to include quantitation controls to assure that amplification and labeling
procedures do not change the true abundance of transcripts in a sample. For this purpose, a sample is
spiked with a known amount of control nucleic acid, and the probes include control probes which
specifically hybridize with the control nucleic acid. After hybridization and processing, the
hybridization signals should reflect accurately the amounts of control nucleic acid added to the sample.
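
One way to check that control signals track the spiked amounts is a simple correlation between the two. The Python sketch below is an assumption made for illustration, not a procedure described in this document; the function name pearson_r is hypothetical, and the correlation coefficient is only one of several possible measures of agreement.

```python
import math


def pearson_r(amounts, signals):
    """Pearson correlation between spiked amounts and measured signals.

    Assumes both lists have the same length and are not constant.
    """
    n = len(amounts)
    mean_a = sum(amounts) / n
    mean_s = sum(signals) / n
    covariance = sum((a - mean_a) * (s - mean_s) for a, s in zip(amounts, signals))
    var_a = sum((a - mean_a) ** 2 for a in amounts)
    var_s = sum((s - mean_s) ** 2 for s in signals)
    return covariance / math.sqrt(var_a * var_s)
```

A coefficient close to 1 across a dilution series of control nucleic acid would suggest that amplification and labeling preserved relative abundance; a markedly lower value would flag the run for review.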
Prior to hybridization, it may be desirable to fragment the nucleic acids of the sample.
Fragmentation improves hybridization by minimizing secondary structure and cross-hybridization
among the nucleic acids in the sample or with noncomplementary probes. Fragmentation can be
performed by mechanical or chemical means.

The nucleic acids may be labeled with one or more labeling moieties to allow for detection and
quantitation of hybridization complexes. The labeling moieties can include compositions that can be
detected by spectroscopic, photochemical, biochemical, bioelectronic, immunochemical, electrical,
optical or chemical means. The labeling moieties include radioisotopes, such as ³²P, ³³P or ³⁵S;
chemiluminescent compounds, labeled binding proteins, heavy metal atoms, spectroscopic markers such
as fluorescent markers and dyes; magnetic labels, linked enzymes, mass spectrometry tags, spin labels,
electron transfer donors and acceptors, and the like.

Exemplary dyes include quinoline dyes, triarylmethane dyes, phthaleins, azo dyes, cyanine
dyes, and the like. Preferably, fluorescent markers absorb light above about 300 nm, more preferably
above 400 nm, and usually emit light at wavelengths at least greater than 10 nm above the wavelength
of the light absorbed. Preferred fluorescent markers include fluorescein, phycoerythrin, rhodamine,
lissamine, and Cy3 and Cy5.
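
The wavelength preferences stated above can be written as a simple predicate. The Python sketch below is illustrative only; the function name meets_marker_preference is hypothetical, and the default thresholds are the 300 nm absorption floor and the 10 nm separation between absorbed and emitted light taken from the text.

```python
def meets_marker_preference(absorb_nm, emit_nm, min_absorb_nm=300.0, min_shift_nm=10.0):
    """True when a marker absorbs above min_absorb_nm and emits more than
    min_shift_nm above its absorption wavelength."""
    return absorb_nm > min_absorb_nm and (emit_nm - absorb_nm) > min_shift_nm
```

Passing min_absorb_nm=400.0 applies the more preferred absorption floor instead.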

Labeling can be carried out during an amplification reaction, such as polymerase chain and in
vitro transcription reactions; by nick translation, or by 5' or 3'-end-labeling reactions. In one case,
labeled nucleotides are used in an in vitro transcription reaction. When the label is incorporated after
or without an amplification step, the label is incorporated either by using a terminal transferase or a
kinase on the 5' end of the target polynucleotide and then incubating overnight with a labeled
oligonucleotide in the presence of T4 RNA ligase.

Alternatively, the labeling moiety can be incorporated after hybridization once a probe/target
complex has formed. In one case, biotin is first incorporated during an amplification step as described
above. After the hybridization reaction, unbound nucleic acids are rinsed away so that the only biotin
remaining bound to the substrate is that attached to targets that are hybridized to probes. Then, an
avidin-conjugated fluorophore, such as avidin-phycoerythrin, that binds with high affinity to biotin is
added. In another case, the labeling moiety is incorporated by intercalation into preformed target/probe
complexes. In this case, an intercalating dye such as a psoralen-linked dye can be employed.

Screening Assays 

Probes or polynucleotides may be used to screen a library of molecules or compounds for
specific binding affinity. The libraries may be DNA molecules, RNA molecules, PNAs, peptides,
proteins such as transcription factors, enhancers, repressors, and other organic or inorganic ligands
which regulate activities such as replication, transcription, or translation of polynucleotides in the
biological system. The assay involves combining the probe with the library of molecules or compounds
under conditions that allow specific binding, and detecting specific binding to a ligand which
specifically binds the probe.

Similarly, a protein or a portion thereof transcribed and translated from a probe may be used to 
screen libraries of molecules or compounds in any of a variety of screening assays. The protein or 
portion thereof may be free in solution, affixed to an abiotic or biotic substrate, borne on a cell surface, 
or located intracellularly. Specific binding between the protein and a ligand may be measured. 
Depending on the kind of library being screened, the assay may be used to identify DNA, RNA, or 
PNAs, agonists, antagonists, antibodies, immunoglobulins, inhibitors, mimetics, peptides, proteins, 
drugs, or any other ligand that specifically binds the protein. 
Purification of Ligand 

Probes may be used to purify a ligand from a sample. A method for using a probe to purify a 
ligand would involve combining the probe with a sample under conditions to allow specific binding, 
detecting specific binding, recovering the bound protein, and using an appropriate agent to separate the 
polynucleotide from the purified ligand. 

Similarly, the encoded protein or a portion thereof may be used to purify a ligand from a 
sample. A method for using a protein or a portion thereof to purify a ligand would involve combining 
the protein or a portion thereof with a sample under conditions to allow specific binding, detecting 
specific binding between the protein and ligand, recovering the bound protein, and using an appropriate 
agent to separate the protein from the purified ligand. 
Hybridization and Detection 

Hybridization causes a denatured polynucleotide probe and a denatured complementary target 
to form a stable duplex through base pairing. Hybridization methods are well known to those skilled in 
the art. (See Ausubel, supra, units 2.8-2.11, 3.18-3.19 and 4.6-4.9.) Conditions can be selected for 
hybridization where completely complementary probe and target can hybridize, i.e., each base 
must pair with its complementary base. Alternatively, conditions can be selected where probe 
and target have mismatches of up to about 10% but are still able to hybridize. Suitable conditions can 
be selected by varying the concentrations of salt in the prehybridization, hybridization, and wash 
solutions or by varying the hybridization and wash temperatures. With some substrates, temperature 
can be decreased by adding formamide to the prehybridization and hybridization solutions. 

Hybridization can be performed at low stringency with buffers, such as 5xSSC with 1% 
sodium dodecyl sulfate (SDS) at 60°C, which permits hybridization between probe and target 
sequences that contain some mismatches to form probe/target complexes. Subsequent washes are 
performed at higher stringency with buffers such as 0.2xSSC with 0.1% SDS at either 45°C (medium 
stringency) or 68°C (high stringency), to maintain hybridization of only those probe/target complexes 
that contain completely complementary sequences. Background signals can be reduced by the use of 
detergents such as SDS, Sarcosyl, or TRITON X-100 (Sigma-Aldrich, St. Louis MO) or a blocking 
agent, such as salmon sperm DNA. 

Hybridization specificity can be evaluated by comparing the hybridization of control probe to 
target sequences that are added to a sample in a known amount. The control probe may have one or 
more sequence mismatches compared with the corresponding target. In this manner, it is possible to 
evaluate whether only complementary probes are hybridizing to the targets or whether mismatched 
hybrid duplexes are forming. 

Hybridization reactions can be performed in absolute or differential hybridization formats. In 
the absolute hybridization format, probes from one sample are hybridized to microarray probes, and 
signals are detected after hybridization complexes form. Signal strength correlates with probe levels in a 
sample. In the differential hybridization format, differential expression of a set of genes in two 
biological samples is analyzed. Probes from the two samples are prepared and labeled with different 
labeling moieties. A mixture of the two labeled targets is hybridized to the microarray probes, and 
signals are examined under conditions in which the emissions from the two different labels are 
individually detectable. Targets in the microarray that are hybridized to substantially equal numbers of 
probes derived from both biological samples give a distinct combined fluorescence (Shalon, 
PCT/WO95/35505). In a preferred embodiment, the labels are fluorescent labels with distinguishable 
emission spectra, such as a lissamine conjugated nucleotide analog and a fluorescein conjugated 
nucleotide analog. In another embodiment, Cy3 and Cy5 fluorophores (Amersham Pharmacia Biotech, 
Piscataway NJ) are employed. 

After hybridization, the microarray is washed to remove nonhybridized polynucleotides, and 
complex formation between the hybridizable array probes and the targets is examined. Methods for 
detecting complex formation are well known to those skilled in the art. In a preferred embodiment, the 
probes are labeled with a fluorescent label, and measurement of levels and patterns of fluorescence 
indicative of complex formation is accomplished by fluorescence microscopy, preferably confocal 
fluorescence microscopy. An argon ion laser excites the fluorescent label, emissions are directed to a 
photomultiplier, and the amount of emitted light is detected and quantitated. The detected signal should 
be proportional to the amount of probe/target complexes at each position of the microarray. The 
fluorescence microscope can be associated with a computer-driven scanner device to generate a 
quantitative two-dimensional image of hybridization intensity. The scanned image is examined to 
determine the abundance/expression level of hybridized probe. 



WO 01/32927 PCT/US00/30396 
Typically, microarray fluorescence intensities can be normalized to take into account variations 
in hybridization intensities when more than one microarray is used under similar test conditions. In a 
preferred embodiment, individual polynucleotide probe/target complex hybridization intensities are 
normalized using the intensities derived from internal normalization controls contained on each 
microarray. 
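
The normalization step can be illustrated with a minimal sketch. The text does not state how the control intensities are combined, so dividing every element by the mean of that array's internal controls is an assumption made here for illustration, and the names `normalize_array`, `array_a`, and `array_b` are hypothetical.

```python
from statistics import mean

def normalize_array(intensities, control_ids):
    """Scale one array's intensities so that its internal controls average 1.0.

    Assumption: controls are combined by a simple mean; the source does not
    specify the formula.
    """
    control_mean = mean(intensities[c] for c in control_ids)
    if control_mean <= 0:
        raise ValueError("control intensities must be positive")
    return {probe: value / control_mean for probe, value in intensities.items()}

# Two hypothetical arrays run under similar conditions; array_b is dimmer overall.
array_a = {"ctrl1": 1000.0, "ctrl2": 3000.0, "geneX": 4000.0}
array_b = {"ctrl1": 500.0, "ctrl2": 1500.0, "geneX": 2000.0}

norm_a = normalize_array(array_a, ["ctrl1", "ctrl2"])
norm_b = normalize_array(array_b, ["ctrl1", "ctrl2"])
```

After scaling, the same element can be compared across arrays even though the raw intensities differ by a constant factor.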

Expression Profiles 

This section describes an expression profile using the polynucleotides of this invention. The 
reference set can be used as part of an expression profile which detects changes in the expression of 
novel genes whose transcripts are modulated in a particular metabolic response, treatment, condition, 
disorder, or disease. These genes will include genes whose altered expression is correlated with a 
cancer, an immunopathology, a neuropathology, and the like. 

The expression profile comprises a plurality of detectable hybridization complexes. Each 
complex is formed by hybridization of one or more probes to one or more complementary targets. At 
least one of the probes, preferably a plurality of probes, is hybridized to a complementary target, 
forming at least one, and preferably a plurality, of complexes. A complex is detected by incorporating 
at least one labeling moiety. The expression profiles provide "snapshots" that can show unique 
expression patterns that are characteristic of a metabolic process, treatment, condition, disorder or 
disease. 

After performing hybridization experiments and detecting signals from a microarray, particular 
probes can be identified and selected based on their expression patterns. Such probes can be used to 
clone a full length sequence for the gene, to screen a library for a closely related homolog, to screen for 
or purify ligands, or to produce a protein. 

Utility of the Invention 

The plurality of polynucleotides can be used as hybridizable elements in a microarray. Such a 
microarray can be employed in several applications including diagnostics, prognostics and treatment 
regimens, and drug discovery and development for conditions, disorders, and diseases such as cancer, 
an immunopathology, a neuropathology and the like. 

Expression Profiles 

In one situation, the microarray is used to monitor the progression of disease. The differences 
in gene expression between healthy and diseased tissues or cells can be assessed and cataloged. By 
analyzing changes in patterns of gene expression, disease can be diagnosed at earlier stages before the 
patient is symptomatic. The invention can be used to formulate a prognosis and to design a treatment 
regimen. The invention can also be used to monitor the efficacy of treatment. For treatments with 
known side effects, the microarray is employed to "fine tune" the treatment regimen. A dosage is 
established that causes a change in genetic expression patterns indicative of successful treatment. 
Expression patterns associated with the onset of undesirable side effects are avoided. This approach 
may be more sensitive and rapid than waiting for the patient to show inadequate improvement, or to 
manifest side effects, before altering the course of treatment. 

Alternatively, animal models which mimic a human disease can be used to characterize 
expression profiles associated with a particular condition, disorder or disease or the treatment of the 
condition, disorder or disease. Experimental treatment regimens may be tested in these animal models 
using microarrays to establish and then follow expression profiles over time. In addition, microarrays 
may be used with cell cultures or tissues removed from animal models to rapidly screen large numbers 
of candidate drug molecules, looking for ones that produce an expression profile similar to those of 
known therapeutic drugs, with the expectation that molecules with the same expression profile will 
likely have similar therapeutic effects. Thus, the invention provides the means to rapidly determine the 
molecular mode of action of a drug. 
Embryonic Stem Cells 

Embryonic stem (ES) cells isolated from rodent or human embryos retain the potential to form 
embryonic tissues. When ES cells such as the mouse 129/SvJ cell line are placed in a blastocyst from 
the C57BL/6 mouse strain, they resume normal development and contribute to tissues of the live-born 
animal. ES cells are preferred for use in the creation of experimental knockout and knockin animals. 
In mice, the method for this process is well known in the art and the steps are: the cDNA is introduced 
into a vector, the vector is transformed into ES cells, transformed cells are identified and microinjected 
into mouse cell blastocysts, and blastocysts are surgically transferred to pseudopregnant dams. The 
resulting chimeric progeny are genotyped and bred to produce heterozygous or homozygous strains. 

ES cells are also used for the treatment of victims of Parkinson's disease, stroke, and other 
neuropathologies (The Scientist 14(18):1ff; September 2000). Pharmaceutical companies are also 
targeting disorders of the liver, kidney, and pancreas, specifically alpha-1 antitrypsin deficiency, 
polycystic kidney disease, and diabetes, respectively. In time, traumatic damage to the nervous system 
and internal organs may also be treated by transplantation of cells or organs which are differentiated from 
embryonic stem cells. The present invention may be used to characterize the developmental pathways 
of the differentiation processes that give rise to brain, heart, kidney, liver, lung, muscle, ovarian, 
pancreatic, small intestine, stomach, or uterine tissues. 

Knockout Analysis 

In gene knockout analysis, a region of a gene is enzymatically modified to include a non-natural 
intervening sequence such as the neomycin phosphotransferase gene (neo; Capecchi (1989) Science 
244:1288-1292). The modified gene is transformed into cultured ES cells and integrates into the 
endogenous genome by homologous recombination. The inserted sequence disrupts transcription and 
translation of the endogenous gene. 
Knockin Analysis 

ES cells can be used to create knockin humanized animals or transgenic animal models of 
human diseases. With knockin technology, a region of a human gene is injected into animal ES cells, 
and the human sequence integrates into the animal cell genome. Transgenic progeny or inbred lines are 
studied and treated with potential pharmaceutical agents to obtain information on the progression and 
treatment of the analogous human condition. 

As described herein, the uses of the cDNAs, provided in the Sequence Listing of this 
application, and their encoded proteins are exemplary of known techniques and are not intended to 
reflect any limitation on their use in any technique that would be known to the person of average skill in 
the art. Furthermore, the cDNAs provided in this application may be used in molecular biology 
techniques that have not yet been developed, provided the new techniques rely on properties of 
nucleotide sequences that are currently known to the person of ordinary skill in the art, e.g., the triplet 
genetic code, specific base pair interactions, and the like. Likewise, reference to a method may include 
combining more than one method for obtaining, assembling or expressing cDNAs that will be known to 
those skilled in the art. It is also to be understood that this invention is not limited to the particular 
methodology, protocols, and reagents described, as these may vary. It is also understood that the 
terminology used herein is for the purpose of describing particular embodiments only, and is not 
intended to limit the scope of the present invention which will be limited only by the appended claims. 
The examples below are provided to illustrate the subject invention and are not included for the purpose 
of limiting the invention. 

EXAMPLES 

For purposes of example, the preparation and sequencing of the BRAINON01 cDNA library is 
described. Preparation and sequencing of other cDNAs in libraries in the LIFESEQ database (Incyte 
Genomics) have varied over time, and the gradual changes involved use of kits, plasmids, and 
machinery available at the particular time the library was made and analyzed. 
I cDNA Library Construction 

The BRAINON01 normalized cDNA library was constructed from cancerous brain tissue 
obtained from a 26-year-old Caucasian male during cerebral meningeal excision following diagnosis of 
grade 4 oligoastrocytoma localized in the right frontoparietal part of the brain. The tumor had been 
irradiated (5800 rads). Patient history included hemiplegia, epilepsy, ptosis of eyelid, and common 
migraine, and medications included Dilantin® (Parke-Davis, Morris Plains NJ). 

The frozen tissue was homogenized and lysed using a POLYTRON homogenizer (PT-3000; 
Brinkmann Instruments, Westbury NY) in guanidinium isothiocyanate solution. The lysate was 
extracted with acid phenol, pH 4.7, per Stratagene RNA isolation protocol (Stratagene, San Diego 
CA). The RNA was extracted with an equal volume of acid phenol, reprecipitated using 0.3 M sodium 
acetate and 2.5 volumes of ethanol, resuspended in DEPC-treated water, and treated with DNase for 25 
min at 37°C. The RNA extraction was repeated with phenol, pH 8.7, and precipitated with sodium 
acetate and ethanol as before. The mRNA was isolated with the OLIGOTEX kit (Qiagen, Chatsworth 
CA) and used to construct the cDNA library. 

The mRNA was handled according to the recommended protocols in the SUPERSCRIPT 
plasmid system (Life Technologies). cDNAs were fractionated on a SEPHAROSE CL4B column 
(Amersham Pharmacia Biotech), and those cDNAs exceeding 400 bp were ligated into PSPORT I 
plasmid (Life Technologies). The plasmid was transformed into DH5α competent cells (Life 
Technologies) to construct the BRAINOT03 library. 

II Normalization of the cDNA Library 

4.9 x 10^6 independent clones of the BRAINOT03 library were grown in liquid culture under 
carbenicillin (25 mg/L) and methicillin (1 mg/ml) selection following transformation by electroporation 
into DH12S competent cells (Life Technologies). The culture was monitored using a DU-7 
spectrophotometer (Beckman Coulter, Fullerton CA) until it reached an OD600 of 0.2, and then 
superinfected with a 5-fold excess of the helper phage M13K07 (Vieira et al. (1987) Methods Enzymol 
153:3-11). 

To reduce the number of highly expressed cDNAs, the library was normalized in a single round 
according to the procedure of Soares et al. (1994, Proc Natl Acad Sci 91:9928-9932) with the 
following modifications: 1) the primer to template ratio in the primer extension reaction was increased 
from 2:1 to 10:1, 2) the ddNTP concentration was reduced to 150 µM to allow generation of longer 
(400-1000 nt) primer extension products, and 3) the reannealing hybridization was extended from 13 to 
48 hours. After the single stranded DNA circles were purified by hydroxyapatite chromatography and 
converted to partially double-stranded by random priming, the cDNAs were electroporated into DH10B 
competent bacteria (Life Technologies) to construct the BRAINON01 normalized library. 

III Isolation and Sequencing of cDNA Clones 

Plasmid DNA was released from bacterial cells and purified using the REAL Prep 96 plasmid 
kit (Qiagen). This kit enabled the simultaneous purification of 96 samples in a 96-well block using 
multi-channel reagent dispensers. The recommended protocol was employed except for the following 
changes: 1) the bacteria were cultured in 1 ml of sterile TERRIFIC BROTH (BD Biosciences, Sparks 
MD) with carbenicillin at 25 mg/L and glycerol at 0.4%; 2) the cultures were inoculated, incubated for 
19 hours, and then lysed with 0.3 ml of lysis buffer; and 3) following isopropanol precipitation, the 
plasmid DNA pellet was resuspended in 0.1 ml of distilled water. 

The cDNAs were prepared using a MICROLAB 2200 system (Hamilton, Reno NV) in 
combination with DNA ENGINE thermal cyclers (PTC200; MJ Research, Waltham MA). The 
cDNAs were sequenced by the method of Sanger and Coulson (1975; J Mol Biol 94:441f) using ABI 
PRISM 377 DNA sequencing systems (Applied Biosystems). Most of the sequences were sequenced 
using standard ABI protocols and kits (Applied Biosystems) at solution volumes of 0.25x - 1.0x. In the 
alternative, some of the sequences were sequenced using solutions and dyes from Amersham Pharmacia 
Biotech. 

IV Selection of Sequences for the Microarray 

Incyte clones were mapped to non-redundant Unigene clusters (Unigene database (build 46), 
NCBI; Shuler (1997) J Mol Med 75:694-698), and the 5' clone with the strongest BLAST alignment 
(at least 90% identity and 100 bp overlap) was chosen, verified, and used in the construction of the 
microarray. The UNIGEM V microarray (Incyte Genomics) contains 7075 array elements which 
represent 4610 annotated genes and 2,184 unannotated clusters. Table 1 shows the GenBank 119 
annotations for SEQ ID NOs:1-416 of this invention as produced by BLAST analysis. 

V Homology Searching of Polynucleotides and Proteins 

BLAST involves finding similar segments between the query sequence and a database 
sequence, evaluating the statistical significance of any similarities, and reporting only those matches 
that satisfy a user-selectable threshold of significance. BLAST produces alignments of both nucleotide 
and amino acid sequences to determine sequence similarity. 

The fundamental unit of the analysis is the High-scoring Segment Pair (HSP). An HSP 
consists of two sequence fragments of arbitrary but equal lengths, whose alignment is locally maximal 
and for which the alignment score meets or exceeds a threshold of significance set by the user. 

The basis of the search is the product score, which is defined as: 

(% sequence identity x % maximum BLAST score) / 100 

The product score takes into account both the degree of identity between two sequences and the 
length of the sequence match as reflected in the BLAST score. The BLAST score is calculated by 
scoring +5 for every base that matches in an HSP and -4 for every mismatch. For a product score of 
40, the match will be exact within a 1% to 2% error, and at a product score of 70, the match will be 
exact. Homologous molecules usually show product scores between 15 and 40, although lower scores 
may identify related molecules. The P-value for any given HSP is a function of its expected frequency 
of occurrence and the number of HSPs observed against the same database sequence with scores at 
least as high. 
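
As a worked illustration of the scoring described above, the following minimal sketch computes a BLAST score from match and mismatch counts and then a product score. The text does not define how the "maximum BLAST score" is obtained; here it is assumed to be the score of a perfect match over the same aligned length, and the function names and the example HSP are hypothetical.

```python
MATCH_REWARD = 5       # +5 for every matching base, as stated in the text
MISMATCH_PENALTY = 4   # -4 for every mismatch, as stated in the text

def blast_score(matches, mismatches):
    """Score an HSP from its match and mismatch counts."""
    return MATCH_REWARD * matches - MISMATCH_PENALTY * mismatches

def product_score(percent_identity, score, max_score):
    """(% sequence identity x % maximum BLAST score) / 100.

    Assumption: max_score is supplied by the caller; the source does not say
    how the maximum is determined.
    """
    percent_of_max = 100.0 * score / max_score
    return percent_identity * percent_of_max / 100.0

# Hypothetical HSP: 100 aligned bases with 95 matches and 5 mismatches.
hsp_score = blast_score(95, 5)        # 5*95 - 4*5
best_score = blast_score(100, 0)      # perfect match over the same length
example = product_score(95.0, hsp_score, best_score)
```

Under these assumptions a perfect match yields a product score of 100, and each mismatch lowers both factors of the product.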

Percent sequence identity is found in a comparison of two or more amino acid or nucleic acid 
sequences. Percent identity can be determined electronically using the MEGALIGN program, a 
component of LASERGENE software (DNASTAR). The percent similarity between two amino acid 
sequences is calculated by dividing the length of sequence A, minus the number of gap residues in 
sequence A, minus the number of gap residues in sequence B, into the sum of the residue matches 
between sequence A and sequence B, times one hundred. Gaps of low or of no homology between the 
two amino acid sequences are not included in determining percentage similarity. 
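
The percent similarity calculation stated above can be written out directly. This is a minimal sketch of the arithmetic only, not of the MEGALIGN program itself; the function name and the example values are hypothetical.

```python
def percent_similarity(length_a, gaps_a, gaps_b, matches):
    """Residue matches divided by (length of A - gaps in A - gaps in B), times 100."""
    denominator = length_a - gaps_a - gaps_b
    if denominator <= 0:
        raise ValueError("gap residues must not exceed the length of sequence A")
    return 100.0 * matches / denominator

# Hypothetical alignment: sequence A has 120 residues, 10 gap residues in each
# sequence, and 80 matching residues.
similarity = percent_similarity(120, 10, 10, 80)
```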

Sequences with conserved protein motifs may be searched using the BLOCKS search program. 
This program analyses sequence information contained in the Swiss-Prot and PROSITE databases and 
is useful for determining the classification of uncharacterized proteins translated from genomic or 
cDNA sequences (Bairoch et al. (1997) Nucleic Acids Res 25:217-221; Attwood et al. (1997) J Chem 
Inf Comput Sci 37:417-424). The PROSITE database is a useful source for identifying functional or 
structural domains that are not detected using motifs due to extreme sequence divergence. Using 
weight matrices, these domains are calibrated against the SWISS-PROT database to obtain a measure 
of the chance distribution of the matches. 

The PRINTS database can be searched using the BLIMPS search program to obtain protein 
family "fingerprints". The PRINTS database complements the PROSITE database by exploiting 
groups of conserved motifs within sequence alignments to build characteristic signatures of different 
protein families. For both BLOCKS and PRINTS analyses, the cutoff scores for local similarity were: 
>1300=strong, 1000-1300=suggestive; for global similarity were: p<exp-3; and for strength (degree of 
correlation) were: >1300=strong, 1000-1300=weak. 
VI Extension of cDNA Clones 

Some of the nucleic acid sequences of the Sequence Listing, designated F, R, or T, were 
produced by extension of an appropriate fragment of the original clone insert using oligonucleotide 
primers designed from this fragment. One primer was synthesized to initiate 5' extension of the known 
sequence, and the other primer, to initiate 3' extension of the known sequence. The initial primers were 
designed using OLIGO software (Molecular Insights, Cascade CO), or another appropriate program, to 
be about 22 to 30 nucleotides in length, to have a GC content of about 50% or more, and to anneal to 
the target sequence at temperatures of about 68°C to about 72°C. Any stretch of nucleotides which 
would result in hairpin structures and primer-primer dimerizations was avoided. 
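
Two of the primer design criteria above, length and GC content, lend themselves to a simple check. The sketch below is illustrative only and does not reproduce the OLIGO software; it treats "about 22 to 30" and "about 50%" as hard limits, does not evaluate annealing temperature, hairpins, or primer dimers, and uses hypothetical names and a hypothetical candidate sequence.

```python
def gc_percent(primer):
    """Percentage of bases in the primer that are G or C."""
    primer = primer.upper()
    return 100.0 * sum(base in "GC" for base in primer) / len(primer)

def meets_basic_criteria(primer, min_len=22, max_len=30, min_gc=50.0):
    """Check length and GC content only (assumed hard thresholds)."""
    return min_len <= len(primer) <= max_len and gc_percent(primer) >= min_gc

# Hypothetical 22-mer in which 12 of 22 bases are G or C.
candidate = "GCGCATATGCGCATATGCGCAT"
passes = meets_basic_criteria(candidate)
```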

Selected human cDNA libraries were used to extend the sequence. If more than one extension 
was necessary or desired, additional or nested sets of primers were designed. 

High fidelity amplification was obtained by PCR using methods well known in the art. PCR 
was performed in 96-well plates using the DNA ENGINE thermal cycler (MJ Research). The reaction 
mix contained DNA template, 200 nmol of each primer, reaction buffer containing Mg2+, (NH4)2SO4, 
and β-mercaptoethanol, Taq DNA polymerase (Amersham Pharmacia Biotech), ELONGASE enzyme 
(Life Technologies), and Pfu DNA polymerase (Stratagene), with the following parameters for primer 
pair PCI A and PCI B: Step 1: 94°C, 3 min; Step 2: 94°C, 15 sec; Step 3: 60°C, 1 min; Step 4: 68°C, 
2 min; Step 5: Steps 2, 3, and 4 repeated 20 times; Step 6: 68°C, 5 min; Step 7: storage at 4°C. In the 
alternative, the parameters for primer pair T7 and SK+ were as follows: Step 1: 94°C, 3 min; Step 2: 
94°C, 15 sec; Step 3: 57°C, 1 min; Step 4: 68°C, 2 min; Step 5: Steps 2, 3, and 4 repeated 20 times; 
Step 6: 68°C, 5 min; Step 7: storage at 4°C. 

The concentration of DNA in each well was determined by dispensing 100 µl PICOGREEN 
reagent (0.25% v/v PICOGREEN (Molecular Probes, Eugene OR) dissolved in 1x TE) and 0.5 µl of 
undiluted PCR product into each well of an opaque fluorimeter plate (Corning Costar, Acton MA), 
allowing the DNA to bind to the reagent. The plate was scanned in a Fluoroskan II (Labsystems Oy, 
Helsinki, Finland) to measure the fluorescence of the sample and to quantify the concentration of DNA. 
A 5 µl to 10 µl aliquot of the reaction mixture was analyzed by electrophoresis on a 1% agarose minigel to 
determine which reactions were successful in extending the sequence. 

The extended nucleotides were desalted and concentrated, transferred to 384-well plates, 
digested with CviJI cholera virus endonuclease (Molecular Biology Research, Madison WI), and 
sonicated or sheared prior to religation into pUC 18 vector (Amersham Pharmacia Biotech). For 
shotgun sequencing, the digested nucleotides were separated on 0.6% to 0.8% agarose gels, fragments 
were excised, and agar digested with AGARACE (Promega). Extended clones were religated using T4 
ligase (New England Biolabs, Beverly MA) into pUC 18 vector (Amersham Pharmacia Biotech), 
treated with Pfu DNA polymerase (Stratagene) to fill in restriction site overhangs, and transfected into 
competent E. coli cells. Transformed cells were selected on antibiotic-containing media, and individual 
colonies were picked and cultured overnight at 37°C in 384-well plates in LB/2x carbenicillin liquid 
media. 

The cells were lysed, and DNA was amplified using Taq DNA polymerase (Amersham 
Pharmacia Biotech) and Pfu DNA polymerase (Stratagene) with the following parameters: Step 1: 
94°C, 3 min; Step 2: 94°C, 15 sec; Step 3: 60°C, 1 min; Step 4: 72°C, 2 min; Step 5: steps 2, 3, and 4 
repeated 29 times; Step 6: 72°C, 5 min; Step 7: storage at 4°C. DNA was quantified by PICOGREEN 
reagent (Molecular Probes) as described above. Samples with low DNA recoveries were reamplified 
using the conditions described above. Samples were diluted with 20% dimethylsulphoxide (1:2 v/v), 
and sequenced using DYENAMIC energy transfer sequencing primers and the DYENAMIC DIRECT 
kit (Amersham Pharmacia Biotech) or the ABI PRISM BIGDYE terminator kit (Applied Biosystems). 

VII mRNA for Target Polynucleotides 

The mRNAs or tissues for preparing target polynucleotides were obtained from Biochain 
Institute (San Leandro CA), International Institute for Advanced Medicine (Exeter PA), and Oncormed 
(Gaithersburg MD). RNA was extracted from tissue samples using the extraction protocol and 
purification procedures described above. 

VIII Microarray Preparation, Labeling of Targets, and Hybridization Analyses 

Substrate Preparation 

Probe polynucleotides were amplified from bacterial vectors by thirty cycles of PCR using 
primers complementary to vector sequences flanking the insert and purified using SEPHACRYL-400 
beads (Amersham Pharmacia Biotech). Purified polynucleotides were robotically arrayed onto a glass 
microscope slide (Corning Science Products, Corning NY) previously coated with 0.05% aminopropyl 
silane (Sigma-Aldrich) and cured at 110°C. The microarray was exposed to UV irradiation in a 
STRATALINKER UV-crosslinker (Stratagene). 
Target Preparation 

Each mRNA sample, shown in Table 2, was reverse transcribed using MMLV reverse 
transcriptase in the presence of dCTP-Cy3 or dCTP-Cy5 (Amersham Pharmacia Biotech) according to 
standard protocol. After incubation at 37°C, the reaction was stopped with 0.5 M sodium hydroxide, 
and RNA was degraded at 85°C. The target polynucleotides were purified using CHROMASPIN 30 
columns (Clontech, Palo Alto CA) and ethanol precipitation. 
Hybridization 

The hybridization mixture, containing 0.2 mg of each of the Cy3 and Cy5 labeled target 
polynucleotides, was heated to 65°C, and dispensed onto the UNIGEM V microarray (Incyte Genomics) 
surface. The microarray was covered with a coverslip and incubated at 60°C. The microarrays were 
sequentially washed at 45°C in moderate stringency buffer (1xSSC and 0.1% SDS) and high stringency 
buffer (0.1xSSC) and dried. 
Detection 

A confocal laser microscope was used to detect the fluorescence-labeled hybridization 
complexes. Excitation wavelengths were 488 nm for Cy3 and 632 nm for Cy5. Each array was scanned 
twice, one scan per fluorophore. The emission maxima were 565 nm for Cy3 and 650 nm for Cy5. The 
emitted light was split into two photomultiplier tube detectors based on wavelength. The output of the 
photomultiplier tube was digitized and displayed as an image, where the signal intensity was represented 
using a linear 20 color transformation, with red representing a high signal and blue a low signal. The 
fluorescence signal for each probe was integrated to obtain a numerical value corresponding to the signal 
intensity using GEMTOOLS expression analysis software (Incyte Genomics). 
IX Data Analysis and Results 

Out of the 7075 genes present on UNIGEM V, 3627 genes or 51% were expressed at a 
significant level across all 30 tissue samples. Significance was defined as signal to background ratio 
exceeding 2.5 and area hybridization exceeding 40% for both probes. All data was transformed so that 
differential gene expression values were on a log base 2 scale. 
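
The significance filter and log transformation can be sketched as follows. This is an illustration under stated assumptions rather than the GEMTOOLS procedure: "both probes" is read as both labeled channels of an array element, the choice of which channel forms the numerator of the ratio is assumed, and all names and readings are hypothetical.

```python
import math

MIN_SIGNAL_TO_BACKGROUND = 2.5   # threshold stated in the text
MIN_AREA_FRACTION = 0.40         # "area hybridization exceeding 40%"

def is_significant(channels):
    """Return True when every channel exceeds both thresholds.

    channels: iterable of (signal_to_background, area_fraction) pairs,
    one pair per labeled sample (for example Cy3 and Cy5).
    """
    return all(
        sb > MIN_SIGNAL_TO_BACKGROUND and area > MIN_AREA_FRACTION
        for sb, area in channels
    )

def log2_differential(reference_signal, test_signal):
    """Differential expression on a log base 2 scale.

    Assumption: the test channel is the numerator; the source does not say.
    """
    return math.log2(test_signal / reference_signal)

element = [(3.1, 0.55), (4.0, 0.62)]        # hypothetical readings, both pass
weak_element = [(3.1, 0.55), (1.8, 0.62)]   # second channel fails the ratio test
fold = log2_differential(100.0, 400.0)      # four-fold higher test signal
```

On this scale a value of 0 means equal signal in both channels, and each unit corresponds to a two-fold change.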
Analysis of Variance 

For each gene, an ANOVA test was run using the tissue categories as the grouping variable. 
The ANOVA tested whether measurements across samples belonging to known categories were 
associated with those categories. ANOVA compares the Variance between (Vb) categories to the 
Variance within (Vw) categories. The ratio of Vb divided by Vw (F ratio) was compared to the F 
distribution for a population of equal degree of freedom (DF) and the probability of the F ratio was 
returned. 
The null hypothesis states that if the measurement variations between samples are due to 
chance only, the variance within categories and variance between categories should be the same. 
Therefore, in the absence of any significant association between gene expression and tissue categories, 
the probability returned by ANOVA is equal to 1. Reciprocally, a strong association between gene 
expression and tissue categories implies that the variance between categories is significantly greater than 
the variance within categories, and therefore the probability returned by ANOVA is small. 
The data for the 340 genes shown in Table 3 were used to produce Table 4, which shows that each gene 
selected for annotation scored an ANOVA probability equal to or below 10^-5. 
Gene Annotation 

Since the selection criterion required that the variances of measurement within tissue categories 
be small (see above), it was acceptable to summarize these measurements as the average of the 
measurements within each tissue category. Furthermore, in order to emphasize differences between 
tissue categories for each gene, the differences between the tissue averages and the all-tissues average were 
computed; the formula and values are shown in Table 5. 

Using these differential average values, genes were associated with a primary tissue category 
according to the highest differential average value. A minimum differential average value of 1.5 was 
required to associate a gene with a tissue category. When possible, genes were associated with a 
secondary, tertiary, and even quaternary tissue category according to the second, third, and fourth 
highest differential average values, respectively. 
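
The annotation step described above may be sketched as follows. Two points are assumptions, since the formula of Table 5 is not reproduced here: the all-tissues average is taken as the mean of the tissue averages, and the minimum value of 1.5 is applied to the secondary, tertiary, and quaternary categories as well as to the primary category. The function names are hypothetical.

```python
MIN_DIFFERENTIAL_AVERAGE = 1.5  # threshold stated in the text
MAX_CATEGORIES = 4              # primary through quaternary


def differential_averages(values_by_tissue):
    """Tissue average minus all-tissues average, per tissue category.

    Assumption: the all-tissues average is the mean of the tissue averages.
    """
    tissue_avg = {t: sum(v) / len(v) for t, v in values_by_tissue.items()}
    all_tissues_avg = sum(tissue_avg.values()) / len(tissue_avg)
    return {t: avg - all_tissues_avg for t, avg in tissue_avg.items()}


def associate_tissues(diff_avg):
    """Rank tissues by differential average and keep up to four.

    Assumption: every retained category must reach the 1.5 minimum.
    The first entry is the primary category, the second the secondary, etc.
    """
    ranked = sorted(diff_avg.items(), key=lambda item: item[1], reverse=True)
    return [t for t, d in ranked[:MAX_CATEGORIES] if d >= MIN_DIFFERENTIAL_AVERAGE]


# Hypothetical log base 2 values for one gene.
gene_values = {
    "liver": [3.0, 3.0],
    "kidney": [2.0],
    "heart": [-1.0],
    "brain": [-2.0],
    "lung": [-2.0],
}
diff = differential_averages(gene_values)
categories = associate_tissues(diff)
```

In this illustrative case the gene is associated with liver as the primary category and kidney as the secondary category; no tertiary category is assigned because no other tissue reaches the minimum.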

X Screening Molecules for Specific Binding with the Polynucleotide or Protein 

The polynucleotide or fragments thereof and the protein or portions thereof are labeled with 
32P-dCTP, Cy3-dCTP, Cy5-dCTP (Amersham Pharmacia Biotech), or BODIPY or FITC (Molecular 
Probes), respectively. Candidate molecules or compounds previously arranged on a substrate are 
incubated in the presence of labeled nucleic or amino acid. After incubation under conditions for either a 
polynucleotide or protein, the substrate is washed, and any position on the substrate retaining label, 
which indicates specific binding or complex formation, is assayed. The binding molecule is identified by 
its arrayed position on the substrate. Data obtained using different concentrations of the nucleic acid or 
protein are used to calculate affinity between the labeled nucleic acid or protein and the bound molecule. 
High throughput screening using very small assay volumes and very small amounts of test compound is 
fully described in Burbaum et al. USPN 5,876,946. 
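
One way in which affinity may be calculated from data obtained at different concentrations is sketched below. The specification does not state a binding model; the sketch assumes a simple one-site model, B = Bmax x C / (Kd + C), and estimates Kd by a least-squares line through the double-reciprocal values (1/C, 1/B). The function name and the example data are hypothetical.

```python
def fit_one_site_binding(concentrations, bound_signals):
    """Estimate (Kd, Bmax) for an assumed one-site binding model.

    Uses the double-reciprocal form 1/B = (Kd/Bmax) * (1/C) + 1/Bmax
    and an ordinary least-squares fit of 1/B against 1/C.
    """
    xs = [1.0 / c for c in concentrations]
    ys = [1.0 / b for b in bound_signals]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # equals Kd / Bmax
    intercept = mean_y - slope * mean_x    # equals 1 / Bmax
    b_max = 1.0 / intercept
    k_d = slope / intercept
    return k_d, b_max


# Hypothetical, noise-free label intensities generated with Kd = 2 and Bmax = 100.
concentrations = [0.5, 1.0, 2.0, 4.0, 8.0]
signals = [100.0 * c / (2.0 + c) for c in concentrations]
k_d, b_max = fit_one_site_binding(concentrations, signals)
```

The double-reciprocal fit is chosen here only because it needs no external library; with noisy measurements a direct nonlinear fit of the binding curve would generally be preferred.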

All patents and publications mentioned in the specification are incorporated herein by 
reference. Various modifications and variations of the described method and system of the invention will 
be apparent to those skilled in the art without departing from the scope and spirit of the invention. 
Although the invention has been described in connection with specific preferred embodiments, it should 
be understood that the invention as claimed should not be unduly limited to such specific embodiments. 
Indeed, various modifications of the described modes for carrying out the invention that are obvious to 
those skilled in the field of molecular biology or related fields are intended to be within the scope of the 
following claims. 
Conditions or Diseases 

type II diabetes 
cancer 

cancer 


ADD, hyperactivity 

cancer 

cancer 

diabetes, asthma 
Alzheimer's 


Cause of Death 
gunshot wound 
intracranial hemorrhage 

accident 

NA 

NA 

accident 
accident 
NA 


accident 
accident 
accident 
accident 

intracranial hemorrhage 

drowning 

accident 

NA 

NA 

closed head injury 

accident 
accident 
accident 
accident 
accident 
accident 
accident 
accident 
NA 


Ethnicity/Sex 

C/M 
C/M 

A/M 
C/F 
C/F 
A/F 
A/F 
C/F 
A/F 
A/M 
A/F 
A/F 
C/M 
C/F 
C/M 
C/F 
C/M 
C/F 
A/M 
A/M 
A/M 
A/M 
A/M 
A/M 
A/M 
A/M 
C/M 






Tissue             Source No.
Ventricle          122
Heart              1822
Heart              B7015
Skeletal Muscle    6986
Tibia              376
Thigh              4071
Uterus             6987
Uterus             6988
Ovary              1119
Stomach            6989
Stomach            6990
Sm Intestine       6991
Colon              6392
Lung               3779
Lung               2881
Lung               2152
Liver              4209
Liver              4133
Liver              2147
Kidney             6993
Kidney             6994
Kidney             6995
Pancreas           6996
Spleen             6997
Spleen             6998
Brain              6999
Brain              7000
Striatum           3971
Lung 
(3779) 


-1.77 
-1.54 
-1.43 

A 1 A 

-0.14 
-5.02 
-2.38 
-2.51 
-3.80 
-3.83 
-1.32 
-1.43 
-3.31 
-1.00 
-1.81 
-2.23 
-2.38 
-1.07 
-1.14 
-0.38 
-0.85 
-0.93 
-1.54 
-0.38 
0.58 
-0.68 
-0.58 
0.14 
0.14 
-1.00 
-1.96 




-1.20 -1.58 -1.20 
-1.00 -1.14 -0.93 
-1.38 -1.26 -1.43 
-0.77 -0.26 -0.38 
-5.00 -4.69 -4.86 
-1.96 -2.23 -2.43 
-4.06 -3.87 -4.41 
-6.14 -5.37 -5.49 
-5.09 -4.13 -4.43 
-1.43 -1.58 -1.54 
-0.93 -1.00 -0.49 
-3.61 -3.62 -2.89 
-0.85 -0.58 -0.77 
-0.49 -1.00 -0.93 
-0.49 -1.20 -0.85 
-0.77 -1.81 -0.68 
-0.85 -0.93 -0.38 
-1.68 -1.20 -1.32 
0.85 0.49 0.58 
0.85 0.14 0.26 
-1.07 -0.38 -0.49 
1.00 0.26 0.77 
-1.07 -1.14 -0.58 
-0.26 -0.26 0.00 
0.38 -0.14 0.00 
0.14 -0.26 0.14 
-0.49 -0.38 0.38 
0.00 -0.38 0.14 
0.85 0.00 0.26 
-1.00 -0.93 -0.58 


Stomach Stomach 
(6989) (6990) 


-0.26 -1.J6 
-0.58 -0.49 
-1.14 -1.26 
-0.14 0.14 
-5.22 -5.50 
-2.00 -2.17 
-2.72 -3.28 
-4.65 -4.98 
-3.09 -3.93 
-1.43 -1.58 
-0.26 0.38 
-2.43 -2.74 
-0.77 -0.26 
-0.14 -1.00 
0.38 -1.38 
-0.93 -1.54 
-0.77 -1.20 
-1.00 -1.32 
0.77 -0.85 
1.49 -0.14 
0.00 -0.58 
0.85 -0.68 
0.14 -1.07 
0.49 -0.85 
1.00 -0.49 
0.26 -0.49 
0.49 0.00 
0.49 -0.58 
1.20 -0.38 
0.26 0.14 


Uterus Uterus Ovary 
(6987) (6988) (1119) 


0.58 -0.26 -2.?0 
-1.00 -0.85 -1.32 
-1.00 -1.07 -1.68 
-0.26 -0.26 -0.38 
-4.95 -4.87 -4.92 
-1.38 -1.43 -2.49 
-2.70 -1.72 -2.74 
-4.18 -4.19 -4.67 
-4.28 -4.19 -4.49 
-1.14 -1.00 -0.77 
-1.14 -1.43 -1.96 
-4.02 -4.19 -4.11 
0.14 0.49 -0.26 
1.07 0.26 -1.26 
1.63 1.63 -0.49 
-2.32 -1.89 -1.85 
-0.85 -0.77 -0.26 
-0.85 -0.93 -1.07 
2.10 2.14 0.58 
2.93 2.68 0.14 
2.04 2.07 0.93 
1.96 1.58 0.00 
3.23 2.70 1.68 
2.61 2.66 1.72 
1.81 1.85 1.58 
1.58 1.77 1.38 
2.23 1.81 1.32 
2.51 2.66 2.14 
3.39 3.14 1.14 
2.07 2.43 2.07 




3.29 3.23 2.98 
2.54 1.07 1.26 
3.00 1.68 2.63 
1.43 2.32 1.49 
4.22 2.68 3.07 
3.02 3.28 3.29 
2.85 3.23 2.77 
4.15 4.26 3.61 
2.89 2.96 3.12 
2.70 1.93 1.77 
2.87 2.00 2.10 
3.10 3.94 3.81 
2.17 2.00 1.49 
3.42 3.54 2.74 
2.98 2.81 2.83 
3.17 2.93 3.02 
3.52 2.66 2.63 
2.51 1.68 1.54 
-1.20 -1.14 -1.32 
-1.49 -2.07 -2.20 
0.68 0.26 0.00 
Ventricle Heart Heart 
(122) (1822) (B7015) 


1.63 -0.26 -0.68 
1.49 1.63 1.14 
0.85 2.00 0.14 
1.81 0.77 0.93 
1.93 2.32 2.00 
2.35 2.17 2.17 
0.77 0.77 0.49 
-0.85 -0.77 -1.85 
2.72 -0.26 2.20 
1.26 0.93 0.85 
1.32 1.32 1.07 
1.96 2.14 1.68 
1.85 1.89 1.54 
0.00 -0.14 -0.68 
-0.14 -0.58 -1.00 
0.26 1.63 0.58 
1.26 0.26 0.00 
-0.14 0.26 -0.38 
-1.63 -1.07 -1.68 
-1.68 -1.54 -2.23 
2.07 1.43 0.77 
1.38 0.77 0.77 
-1.32 -1.32 -1.89 
0.00 -0.26 -0.49 
-1.00 -0.26 -0.93 
0.38 0.00 0.14 
Clone ID 


224996 
78783 
1672467 
2950063 
3288518 

184110 
1368173 
1813409 
58309 
1721744 
1924344 
3176845 
2286809 
1985244 
1570042 
2079906 
2852042 
1319020 
1572555 

782235 
1314882 
1403636 
1968921 
1558081 
2495131 
4049957 
1686585 
0.26 
7.89E-10 
0.09 


64.47 
1921393 


9.15 


0.21 


43.78 


1.07E-10 
0.28 
1. A plurality of cell and tissue specific polynucleotides selected from SEQ ID NOs:1-416 or 
the complement thereof. 

2. A subset of the polynucleotides of claim 1, wherein the subset is selected from at least one 
of the groups consisting of 

a) SEQ ID NOs:209-218 and 1-10, cell specific polynucleotides of heart and fragments 
thereof; 

b) SEQ ID NOs:219-249 and 11-41, cell specific polynucleotides of skeletal muscle and 
fragments thereof; 

c) SEQ ID NOs:250-251 and 42-43, cell specific polynucleotides of uterus and fragments 
thereof; 

d) SEQ ID NOs:252-256 and 44-48, cell specific polynucleotides of ovary and fragments 
thereof; 

e) SEQ ID NOs:257-263 and 49-55, cell specific polynucleotides of stomach and 
fragments thereof; 

f) SEQ ID NOs:264-283 and 56-75, cell specific polynucleotides of intestine and 
fragments thereof; 

g) SEQ ID NOs:284-293 and 76-85, cell specific polynucleotides of lung and fragments 
thereof; 

h) SEQ ID NOs:294-345 and 86-137, cell specific polynucleotides of liver and fragments 
thereof; 

i) SEQ ID NOs:346-356 and 138-148, cell specific polynucleotides of kidney and 
fragments thereof; 

j) SEQ ID NOs:357-374 and 149-166, cell specific polynucleotides of pancreas and 
fragments thereof; and 

k) SEQ ID NOs:375-416 and 167-208, cell specific polynucleotides of brain and 
fragments thereof. 

2. The composition of claim 1, wherein the polynucleotides are immobilized on a substrate. 

3. A high throughput method for detecting expression of a polynucleotide in a sample, the 
method comprising: 

a) hybridizing the polynucleotides of claim 1 with the nucleic acids of the sample under 
conditions to form a hybridization complex; and 

b) detecting the hybridization complex, wherein the presence of hybridization complex 
indicates expression of the polynucleotide in the sample. 

4. The method of claim 3 wherein the nucleic acids of the sample are amplified prior to 
hybridization. 

5. The method of claim 3 wherein hybridization complex formation indicates the 
differentiation of embryonic stem cells into a tissue selected from the group consisting of brain, heart, 
kidney, liver, lung, muscle or pancreatic tissues. 

6. A high throughput method of screening molecules or compounds to identify a ligand, the 
method comprising: 

a) combining the polynucleotides of claim 1 with molecules or compounds under 
conditions to allow specific binding; and 

b) detecting specific binding, thereby identifying a ligand which specifically binds to the 
composition. 

7. The method of claim 6 wherein the molecules or compounds are selected from DNA 
molecules, RNA molecules, peptide nucleic acids, mimetics, peptides, and proteins. 

8. An isolated polynucleotide selected from SEQ ID NOs:212, 228, 233, 259, 271, 287, 316- 
319, 324, 370, 379, 380, 383, 410, and 412 or a fragment thereof. 

9. The polynucleotides of claim 8 wherein the fragments are SEQ ID NOs:4, 20, 25, 51, 63, 
79, 108-111, 116, 162, 171, 172, 175, 202, and 204, respectively. 

10. An expression vector containing a polynucleotide of claim 8. 

11. A host cell containing the expression vector of claim 10. 

12. A method for producing a protein, the method comprising the steps of: 

(a) culturing the host cell of claim 11 under conditions for the expression of protein; and 

(b) recovering the protein from the host cell culture. 

13. A protein produced by the method of claim 12. 

14. A high-throughput method for screening a library of molecules or compounds to identify 
at least one ligand which specifically binds a protein, the method comprising: 

(a) combining the protein of claim 13 with the library under conditions to allow specific 
binding; and 

(b) detecting specific binding between the protein and a molecule or compound, thereby 
identifying a ligand which specifically binds the protein. 

15. The method of claim 14 wherein the library is selected from DNA molecules, RNA 
molecules, peptide nucleic acids, mimetics, peptides, proteins, agonists, antagonists, antibodies or 
their fragments, immunoglobulins, inhibitors, drug compounds, and pharmaceutical agents. 

16. A method of purifying a ligand from a sample, the method comprising: 

a) combining the protein of claim 13 with a sample under conditions to allow specific 
binding; 

b) recovering the bound protein; and 

c) separating the protein from the ligand, thereby obtaining purified ligand. 

17. A composition comprising the protein of claim 13 in conjunction with a pharmaceutical 
carrier. 

18. A purified antibody that specifically binds to the protein of claim 13. 
caaacagaag ctggagaagg agaagagtga gctgaagatg gagactgatg acctcagcag 3780 
taacgcagag gccatttcca aagccaaggg aaaccttgaa aagatgtgcc gctctctaga 3840 
agatcaagtg agtgagctta agaccaagga agaggagcag cagcggctga tcaatgacct 3900 
cacagcacag agagcgcgcc tgcagacaga agcgggtgaa tattctcgac aattagatga 3960 
gaaagatgct ttagtctctc agctttcaag gagcaagcaa gcatctactc agcagattga 4020 
agagctgaaa catcaactag aggaagaaac taaagccaag aacgccctgg cgcatgccct 4080 
gcagtcttcc cgccacgact gtgacctgct gcgggaacag tatgaggagg agcaggaatc 4140 
caaggccgag ctgcagagag cactgtccaa ggccaacacc gaggttgccc aatggaggac 4200 
caaatacgag acggatgcca tccagcgcac agaggagctg gaggaggcca agaaaaagtt 4260 
ggcccagcgc ctgcaagaag ctgaggaaca tgtagaagct gtgaacgcca aatgtgcttc 4320 
ccttgagaag acgaagcagc ggctccagaa tgaagttgaa gacctcatgc ttgatgtgga 4380 
aaggtctaat gcagcctgtg cagcccttga taagaagcaa aggaactttg acaaggtcct 4440 
atcagaatgg aagcagaagt atgaggaaac tcaggctgaa cttgaggcct cccagaagga 4500 
gtcacgttct cttagcactg agctgttcaa ggtgaagaat gtctatgagg aatccctgga 4560 
tcaactcgaa acgctaagaa gagaaaataa gaacttgcaa caggagattt ctgacctcac 4620 
tgagcagatt gcagagggag gaaagcaaat tcatgaattg gagaaaataa agaagcaagt 4680 
agaacaagag aaatgtgaaa ttcaggctgc tttagaggaa gcagaggcat ctcttgaaca 4740 
tgaagaagga aagattctgc gtatccagct tgagttaaac caagtcaagt ctgaagttga 4800 
tagaaaaatc gcagaaaagg atgaggaaat tgaccagctg aagagaaacc acactagagt 4860 
cgtggagaca atgcagagca cgctggatgc agagattaga agcagaaatg atgctctgag 4920 
agtcaagaag aaaatggaag gagatctgaa tgaaatggaa atccagctga accatgccaa 4980 
tcgcttagct gcagagagtt taaggaacta caggaacacc caaggaatcc tgaaggaaac 5040 
ccagctccac ctggatgatg ctctccgggg ccaggaggac ctcaaggaac agctggcaat 5100 
tgtggagcgc agagccaacc tgctgcaggc tgagatcgag gagctgtggg ccactctgga 5160 
acagacagag agaagcagga aaatcgccga acaggagctc ctggatgcca gtgagcgtgt 5220 
ccagctcctc cacacccaga ataccagtct cattaacacc aagaagaaat tagaaaatga 5280 
cgtttcccaa ctccaaagtg aagtggaaga agtaatccaa gaatcacgca atgcagaaga 5340 
gaaagccaag aaggccatca ctgatgctgc catgatggct gaggagctga agaaggaaca 5400 
ggacaccagc gcccacctgg agcggatgaa gaagaacctg gagcagacgg tgaaggacct 5460 
gcagcatcgt ctagatgagg ccgagcagct ggcgctgaag ggtgggaaga agcagatcca 5520 
gaaactggag gccagggtac gtgagcttga aggagaggtt gaaaatgaac agaaacgtaa 5580 
tgcagaggct gttaaaggtt tacggaaaca tgagcgacga gtaaaagaac tcacctacca 5640 
gactgaagaa gatcgcaaga atgttctcag gctgcaggac ttggtagata aattacaggc 5700 
gaaggtgaaa tcatacaaga gacaagctga ggaggctgag gaacaatcca atgctaatct 5760 
atctaaattc cgcaaactcc agcatgagct ggaggaggcc gaggaacggg ctgacattgc 5820 
tgagtcccag gtcaacaaat tgcgagtgaa gagccgagag gttcacacaa aaatcagtgc 5880 
agagtaaaca cacctgcctg atgctatcaa gaggctgaag aaaggcacaa aatgtgctat 5940 
ttttggtcac ttgctttatg acgtttattt tcctgttaaa gctgaataaa taaaaactac 6000 
agtaaatgta tacatt 6016 

<210> 231 
<211> 3454 
<212> DNA 
<213> Homo sapiens 
<220> 
<221> misc_feature 
<223> Incyte ID No: 1627492.con 
<400> 231 

cctggtcagc gtcccatccc ggtcgggagt tctctccagg cggcacgatg ccgaggaaac 60 
agtgaccctg agcgaagcca agccgggcgg caggtgtggc tttgatagct ggtggtgcca 120 
cttcctggcc ttggatgagc cgtacgcctc tgtaaaccca acttcctcac ctttgaaaca 180 
gctgcctggt tcagcattaa tgaagattag tcagtgacag gcctggtgtg ctgagtccgc 240 
acatagaaga atcaaaaatg tccaaaatgt aactggagag aaagtgggca acttttggga 300 
gtgacttttc cacaggaact tctgcaatgt cccatcaacc tctcagctgc ctcactgaaa 360 
aggaggacag ccccagtgaa agcacaggaa atggaccccc ccacctggcc cacccaaacc 420 
tgggacacgt ttaccccgga ggagctgctg cagcagatga aagagctcct gaccgagaac 480 
caccagctga aagaagccat gaagctaaat aatcaagcca tgaaagggag atttgaggag 540 
ctttcggcct ggacagagaa acagaaggaa gaacgccagt tttttgagat acagagcaaa 600 
gaagcaaaag agcgtctaat ggccttgagt catgagaatg agaaattgaa ggaagagctt 660 
ggaaaactaa aagggaaatc agaaaggtca tctgaggacc ccactgatga ctccaggctt 720 
cccagggccg aagcggagca ggaaaaggac cagctcagga cccaggtggt gaggctacaa 780 
gcagagaagg cagacctgtt gggcatcgtg tctgaactgc agctcaagct gaactccagc 840 
ggctcctcag aagattcctt tgttgaaatt aggatggctg aaggagaagc agaagggtca 900 
gtaaaagaaa tcaagcatag tcctgggccc acgagaacag tctccactgg cacggcattg 960 
tctaaatata ggagcagatc tgcagatggg gccaagaatt acttcgaaca tgaggagtta 1020 
actgtgagcc agctcctgct gtgcctaagg gaagggaatc agaaggtgga gagacttgaa 1080 
gttgcactca aggaggccaa agaaagagtt tcagattttg aaaagaaaac aagtaatcgt 1140 
tctgagattg aaacccagac agaggggagc acagagaaag agaatgatga agagaaaggc 1200 
ccggagactg ttggaagcga agtggaagca ctgaacctcc aggtgacatc tctgtttaag 1260 
gagcttcaag aggctcatac aaaactcagc gaagctgagc taatgaagaa gagacttcaa 1320 
gaaaagtgtc aggcccttga aaggaaaaat tctgcaattc catcagagtt gaatgaaaag 1380 
caagagcttg tttatactaa caaaaagtta gagctacaag tggaaagcat gctatcagaa 1440 
atcaaaatgg aacaggctaa aacagaggat gaaaagtcca aattaactgt gctacagatg 1500 
acacacaaca agcttcttca agaacataat aatgcattga aaacaattga ggaactaaca 1560 
agaaaagagt cagaaaaagt ggacagggca gtgctgaagg aactgagtga aaaactggaa 1620 
ctggcagaga aggctctggc ttccaaacag ctgcaaatgg atgaaatgaa gcaaaccatt 1680 
gccaagcagg aagaggacct ggaaaccatg accatcctca gggctcagat ggaagtttac 1740 
tgttctgatt ttcatgctga aagagcagcg agagagaaaa ttcatgagga aaaggagcaa 1800 
ctggcattgc agctggcagt tctgctgaaa gagaatgatg ctttcgaaga cggaggcagg 1860 
cagtccttga tggagatgca gagtcgtcat ggggcgagaa caagtgactc tgaccagcag 1920 
gcttaccttg ttcaaagagg agctgaggac agggactggc ggcaacagcg gaatattccg 1980 
attcattcct gccccaagtg tggagaggtt ctgcctgcct ccgtcttcga aagcatcatt 2040 
ctctttcagc agaactgcca gctgcaatgc cagttgctcc ttttcctcat gaattttctc 2100 
tctcgctgct ctttcagcat gaaaatcaga acagtaaact tccatctgag ccctgaggat 2160 
ggtcatggtt tccaggtcct cttcctgctt ggcaatggtt tgcttcattt catccatttg 2220 
cagctgtttg gaagccagag ccttctctgc cagttccagt ttttcactca gttccttcag 2280 
cactgccctg tccacttttt ctgactcttt tcttgttagt tcctcaattg ttttcaatgc 2340 
attattatgt tcttgaagaa gcttgttgtg tgtcatctgt agcacagtta atttggactt 2400 
ttcatcctct gttttagcct gttccatttt gatttctgat agcatgcttt ccacttgtag 2460 
ctctaacttt tgttagtata aacaagctct tgcttttcat tcaactctga tggaattgca 2520 
gaatttttcc tttcaagggc ctgacaccta agaaaggtca agatttcatg aggataaaca 2580 
atgctccttt acgtcgcagt ggcttttaaa caaaccccaa ggttatgtac aatataagaa 2640 
ttttagggct gtagggttta tgaatgcaaa atgaaataca taatcctcct tgcttcttca 2700 
ctagacctgg ccatatttct gaacccttac gcctaaacat aaaaaagtac agatctaagc 2760 
tcacatgcag tactgtcctc tttgaataat tgtttaaact agtcatacag atacattttt 2820 
agtatttaaa cacagaccac agtagtaatt agctaaggcg cttctcatgt agtgatgtgt 2880 
ggatatcttt tatattaaaa aaaaagttat aggttctgtc aattgcaaag actttagttt 2940 
ctgacaaaga ttcaataagg gaaaaaggaa gttcccctaa gtcttcttaa agatcggaaa 3000 
gaggatttca atataataca aataaatgac agctatgcac aaagcattaa tgcccagccc 3060 
attagacata tcatttaaca caatgtatac agaagaaatt acaacacaca atttacaaat 3120 
gccaagacag gctcagagaa ttgaaataaa taccttgccc aacacaaata agttgcagac 3180 
catatgttaa atctaggtct gtttgtccaa agtccttagt tgtagctgcc atgttgtatc 3240 
acgttaccat aattctacag agtgttggga attataacct caagaaaaca cacacacctc 3300 
cgtattgtgc tagaaacatg tttgccaatg aatccaatag aacagcatca acttaggttt 3360 
gctaagtttt gggtaacttc ttacatcaag agaaacaagc aaatgccaaa cgctcaatct 3420 
acatgttagc ctgaaataat taactgattg aaat 3454 

<210> 232 
<211> 1553 
<212> DNA 
<213> Homo sapiens 
<220> 
<221> misc_feature 
<223> Incyte ID No: 3354111CB1 
<400> 232 

cccacgcgtc cgcagccatg gagcagcttc gcgccgccgc ccgtctgcag attgttctgg 60 
gccacctcgg ccgcccctcg gccggggctg tcgtagctca tcccacttca gggactattt 120 
cctctgccag tttccatcct caacaattcc agtatactct ggataataat gttctaaccc 180 
tggaacagag aaaattttat gaagaaaatg ggtttctagt aatcaaaaat cttgtacctg 240 
atgccgatat tcaacgcttt cggaatgagt ttgaaaaaat ctgcagaaag gaggtgaaac 300 
cattaggatt aacagtaatg agagatgtga ccatttcgaa atccgaatat gctccaagtg 360 
agaagatgat cacgaaggtc caggatttcc aggaagataa ggagctcttc agatactgca 420 
ctctccccga gattctgaaa tatgtggagt gcttcactgg acctaatatt atggccatgc 480 
gcacaatgtt gataaacaaa cctccagatt ctggcaagaa gacgtcccgt caccccctgc 540 
accaggacct gcactatttc cccttcaggc ccagcgatct catcgtttgc gcctggacgg 600 
cgatggagca catcagccgg aacaacggct gtctggttgt gctcccaggc acacacaagg 660 
gctccctgaa gccccacgat taccccaagt gggagggggg agttaacaaa atgttccacg 720 
ggatccagga ctacgaggaa aacaaggccc gggtgcacct ggtgatggag aagggcgaca 780 
ctgttttctt ccatcctttg ctcatccacg gatctggtca gaataaaacc cagggattcc 840 
ggaaggcaat ttcctgccat ttcgccagtg ccgattgcca ctacattgac gtgaagggca 900 